Machine learning predictive models for grading bronchopulmonary dysplasia: umbilical cord blood IL-6 as a biomarker

Objectives This study aimed to analyze the predictive value of umbilical cord blood Interleukin-6 (UCB IL-6) for the severity-graded BPD and to establish machine learning (ML) predictive models in a Chinese population based on the 2019 NRN evidence-based guidelines. Methods In this retrospective analysis, we included infants born with gestational age <32 weeks, who underwent UCB IL-6 testing within 24 h of admission to our NICU between 2020 and 2022. We collected their medical information encompassing the maternal, perinatal, and early neonatal phases. Furthermore, we classified the grade of BPD according to the 2019 NRN evidence-based guidelines. The correlation between UCB IL-6 and the grades of BPD was analyzed. Univariate analysis and ordinal logistic regression were employed to identify risk factors, followed by the development of ML predictive models based on XGBoost, CatBoost, LightGBM, and Random Forest. The AUROC was used to evaluate the diagnostic value of each model. Besides, we generated feature importance distribution plots based on SHAP values to emphasize the significance of UCB IL-6 in the models. Results The study ultimately enrolled 414 preterm infants, with No BPD group (n = 309), Grade 1 BPD group (n = 73), and Grade 2–3 BPD group (n = 32). The levels of UCB IL-6 increased with the grades of BPD. UCB IL-6 demonstrated clinical significance in predicting various grades of BPD, particularly in distinguishing Grade 2–3 BPD patients, with an AUROC of 0.815 (95% CI: 0.753–0.877). All four ML models, XGBoost, CatBoost, LightGBM, and Random Forest, exhibited Micro-average AUROC values of 0.841, 0.870, 0.851, and 0.878, respectively. Notably, UCB IL-6 consistently appeared as the most prominent feature across the feature importance distribution plots in all four models. Conclusion UCB IL-6 significantly contributes to predicting severity-graded BPD, especially in grade 2–3 BPD. Through the development of four ML predictive models, we highlighted UCB IL-6's importance.


Introduction
Bronchopulmonary dysplasia (BPD) is a chronic respiratory disorder originating in the neonatal period and is a major complication among premature infants.BPD results in long-term pulmonary issues, imposing significant economic burdens on both society and families (1).With advancements in neonatal care technology, more premature infants are able to survive, leading to a gradual increase in the global incidence of BPD (2).Effective treatment strategies for BPD are currently lacking (3).The prognosis for newborns with different grades of BPD varies, with a higher likelihood of mortality and neurological damage observed in those with more severe BPD (4).Assessing the severity-graded BPD usually happens late, either by 36 weeks of postnatal menstrual age (PMA) or upon discharge home.This isn't conducive for guiding treatment and devising effective strategies.Hence, early prediction of BPD grade is vital, making the exploration of predictive factors for early grading BPD crucial.
Interleukin-6 (IL-6) is a key member of the cytokine family, participating in cell-to-cell signaling and serving a crucial regulatory function in the immune system.IL-6 can accelerate the progression of lung inflammation (5).Bioinformatics analysis indicates that IL-6 is among the top ten hub genes in BPD, and both mRNA and protein levels of IL-6 are highly expressed in the peripheral blood of newborns with BPD (6).Umbilical cord blood (UCB) is non-invasive for newborns and can be collected early, making it a compelling choice for disease prediction.Previous studies have indicated a link between UCB IL-6 and the occurrence of BPD, suggesting it as a potential biomarker for predicting BPD (7)(8)(9).However, the relationship between UCB IL-6 and severity grade of BPD has yet to be investigated.
While numerous prediction tools exist for BPD, there is a scarcity of predictive models specifically addressing the severity grade of BPD (10).Most models still rely on the 2001 NICHD guidelines for assessing and grading BPD.Unfortunately, this standard is no longer applicable, failing to reflect the contemporary neonatal respiratory care practices such as the widespread use of high-flow nasal cannula (11).In 2019, a modern, evidence-based definition of BPD gained widespread recognition.This definition, established by the Neonatal Research Network (NRN), categorizes the grades of BPD based on the respiratory support patterns at PMA 36 weeks (12).When developing predictive models, artificial intelligence (AI) offers increased flexibility, allowing adaptation to diverse data types (13).AI models are particularly adept at capturing complex nonlinear relationships in data, a task that can be challenging for traditional models.In the realm of AI, machine learning (ML) is a key subfield that involves the automated identification of methods and parameters within data to find optimal solutions.Furthermore, ML models hold the potential to establish objective classification standards, thus enhancing the reliability and effectiveness of the models.ML models have emerged as promising predictive tools widely applied in clinical settings (14).
In this study, our aim is to explore the predictive value of UCB IL-6 for grading BPD and to establish effective early predictive models in a Chinese population using ML based on the 2019 NRN standards.

Patients
We conducted a retrospective cohort study, approved by the Ethics Committee of the First Affiliated Hospital of Zhengzhou University, with the ethics approval number: 2019-KY-95.All participants provided informed consent from their parents.The study's inclusion criteria were as follows: (1) gestational age (GA) at birth of less than 32 weeks, (2) admission to the NICU between January 2020 and December 2022 with survival until PMA 36 weeks, and (3) completion of UCB IL-6 testing within 24 h of admission.Exclusion criteria encompassed (1) significant congenital anomalies or chromosomal abnormalities, (2) death or discharge against medical advice before reaching PMA 36 weeks, and (3) incomplete data.The flow chart is shown in Figure 1.

Data collection and UCB IL-6 determinations
We retrospectively obtained clinical data from the hospital's electronic medical record system, including maternal factors, delivery characteristics, neonatal factors, respiratory support and initial laboratory tests.Respiratory support details encompassed initial modes of continuous positive airway pressure (CPAP), initial inspiratory oxygen concentration (FiO 2 ), and invasive ventilation.Initial laboratory tests covered UCB IL-6, N-terminal pro-brain natriuretic peptide (NT-proBNP), white blood cell count (WBC), pH value, chloride ion concentration, sodium ion concentration, and blood glucose levels, all collected within 24 h of admission.
The detection of UCB IL-6 concentration was performed using a multiplex microsphere flow cytometry assay (BD FACSCantoTM II flow cytometer manufactured by BD Company, Qingdao, China).The normal range for UCB IL-6 is 0-5.4 pg/ml.
Pregnancy induced hypertension is defined as systolic blood pressure of 140 mmHg or higher, diastolic blood pressure of 90 mmHg or higher, or the presence of both after 20 weeks of gestation (15).Gestational diabetes mellitus (GDM) refers to any degree of impaired glucose tolerance during pregnancy, regardless of whether diabetes was present prior to pregnancy (16).Preterm rupture of membranes (PROM) is defined as the rupture of fetal membranes before the onset of labor (17).Fetal growth restriction is defined as an estimated fetal weight or abdominal circumference below the 10th percentile for GA on ultrasound (18).According to the Fenton Chart, a birth weight (BW) at or below the 10th percentile of the average BW for the same GA is classified as small for gestational age (SGA) (19).All this data was obtained from the obstetric electronic medical records on the day of birth.
Respiratory distress syndrome (RDS) and intraventricular hemorrhage (IVH) are defined according to standard criteria (20,21).Respiratory failure (RF) was diagnosed based on clinical features and laboratory results (22).White matter injury (WMI) diagnosis involved the use of Doppler ultrasound imaging.Hemodynamically significant patent ductus arteriosus (HsPDA) is defined as a ductal diameter of 1.5 mm or larger, and the diagnosis is confirmed through echocardiography, which shows abnormal blood flow from the aorta to the pulmonary artery (23, 24).Neonatal sepsis (NS) diagnosis includes both culture-based and culture-independent methods (25).All the aforementioned clinical complications were collected within 7 days.

Definition of study outcomes 2.3.1. Primary outcome
In our study, the primary outcome focuses on diagnosing and grading BPD.According to the 2019 NRN evidence-based guidelines (12), severity-graded definitions of BPD are based on the mode of respiratory support at PMA 36 weeks, regardless of prior or current oxygen therapy: No BPD, no support; Grade 1 BPD, nasal cannula at flow rates ≤2 L/min; Grade 2 BPD, nasal cannula at flow rates >2 L/ min or noninvasive positive airway pressure; and Grade 3 BPD, invasive mechanical ventilation.Since there were only two cases of Grade 3 BPD, we combined them with the Grade 2 BPD cases, creating a single group referred to as Grade 2-3 BPD.

Secondary outcome
Furthermore, to explore the impact of different grades of BPD on clinical burden, we conducted supplementary analyses on length of hospital stay, total cost, and corrected age at discharge.Hospital length of stay is the entire duration from admission to discharge, measured in days.The total cost we obtained represents all expenses documented in the hospital's records during hospitalization.Corrected age at discharge was calculated by subtracting the number of weeks a child was born prematurely to her/his chronological age at the time of discharge to home (26).

Data analysis
For normally distributed numerical variables, we utilized mean ± SD for presentation.Conversely, for numerical variables that did not follow a normal distribution, we opted the median (P25-P75) for representation.Additionally, to meet the normal distribution assumptions, logarithmic transformations were applied to the values of UCB IL-6 and NT-proBNP using the natural logarithm (base e).Normally distributed variables were analyzed using analysis of variance (ANOVA), while non-normally distributed variables were assessed using the Kruskal-Wallis H test.For variables with significant P values, further pairwise comparisons were conducted.Categorical data were expressed as counts (%), and intergroup comparisons were performed using either the chisquare test or Fisher's exact test, as appropriate.Subsequently, the main objective was to assess the correlation between UCB IL-6 and the severity grades of BPD through logistic regression analysis.
Factors with P values <0.10 were selected for regression analysis.Univariate and multivariate ordinal logistic regression analyses were performed.Predictive models were established using all statistically significant risk factors (P < 0.05) identified through the multivariate analysis.

Model development
In the field of ML, ensemble learning algorithms are widely acknowledged as superior to traditional single-model methods.The two main branches of ensemble learning are boosting and bagging, with XGBoost, LightGBM, CatBoost, GBDT, and AdaBoost   Ten-fold cross-validation was utilized to train and evaluate our four models.The dataset was split into ten uniform partitions using stratified sampling, maintaining consistent proportions for each category within every partition, mirroring the original dataset.Each model underwent ten rounds of training and evaluation, with nine subsets used for training and one subset for testing in each iteration.This tenfold process ensured each subset was used for testing once.The final evaluation represented the average of these ten assessment rounds, providing a more stable assessment of model performance by reducing the impact of random data partitioning.To address the imbalance in sample sizes among different classes in our dataset, we applied balanced sampling to maintain consistent training across all categories.Additionally, we conducted a grid hyperparameter search to optimize the models' performance.

Model evaluation
We used Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) to assess the models' classification performance.We not only utilized Macro-average ROC for an overall evaluation but also employed Micro-average ROC to mitigate class imbalance, offering a more accurate reflection of the models' predictive capabilities.Apart from these metrics, we also utilized F1 score, precision, recall and accuracy to assess the models' performance.Furthermore, we calculated SHAP values and generated feature importance distribution plots to evaluate the significance of UCB IL-6 in each model.

Experiment setting
All our experiments were based on Python 3.8, utilizing the following libraries: scikit-learn 0.24.1,XGBoost, LightGBM, Catboost, and GridSearch.We set the random seed to 42.The trained models and inference evaluation codes have been uploaded to GitHub.You can click the following link to view them (https://github.com/1183371714/GBD).

Demographics and clinical burden
The research ultimately included 414 preterm infants, with merely two cases falling into the Grade 3 BPD category.All patients were divided into three groups: the No BPD group (n = 309), the Grade 1 BPD group (n = 73), and the Grade 2-3 BPD group (n = 32).The demographic characteristics and clinical burden of the patients are detailed in Table 1.It's clear that infants with more severe BPD tend to have a lower GA, and those in the No BPD group have the highest BW.Additionally, Grade 2-3 BPD infants have a higher proportion of males compared to the No BPD group.Notably, the analysis of differences in clinical burdens among the three groups indicates that groups with higher BPD grades experience heavier clinical burdens.
In the univariate analysis, infants with Grade 2-3 BPD were less likely to receive antenatal corticosteroid treatment.After birth, significant differences were observed among the different groups in Apgar scores and rates of asphyxia.Infants with a higher grade of BPD exhibited lower Apgar scores and a greater likelihood of requiring resuscitation for asphyxia.The risk factors during the antepartum and perinatal periods are listed in Table 2.
Patients with No BPD initially require the lowest FiO 2 and are most likely to receive initial breathing support through CPAP.Grade 2-3 BPD patients require the highest FiO 2 and are most likely to receive initial support through invasive ventilation.Compared to the No BPD group, Grade 2-3 BPD patients are more likely to receive surfactant therapy.Among the three groups, Grade 2-3 BPD patients have the highest levels of initial NT-proBNP and UCB IL-6, while having the lowest pH values.Children in the No BPD group have the lowest likelihood of experiencing pulmonary hemorrhage and HsPDA.The more severe the condition, the higher the likelihood of RF, pulmonary hypertension, and IVH.The early postnatal clinical characteristics are presented in Table 3.

The predictive value of UCB IL-6 levels for the severity grade of BPD
The comparison of UCB IL-6 levels among different BPD groups is illustrated in Figure 2. Significant differences (P <   0.05) were observed between any two groups, with higher UCB IL-6 levels in the groups with more severe BPD.UCB IL-6 levels were not effective in distinguishing between infants with Grade 1 BPD and those with No BPD.However, as illustrated in Figure 3, in ROC curve a, UCB IL-6 effectively distinguishes infants with Grade 2-3 BPD from those with No BPD, with an AUC of 0.815 (95% CI: 0.753-0.877).In ROC curve b, UCB IL-6 demonstrates predictive potential for distinguishing between less severe (Grade 0-1) and more severe (Grade 2-3) BPD, supported by an AUC of 0.800 (95% CI: 0.738-0.862).The optimal cutoff value for UCB IL-6 in both cases was 47.70 pg/ml.The sensitivity was 81.3% in both cases, with specificities of 70.2% and 68.3%, respectively.Within the subgroup of BPD patients, ROC curve c demonstrates UCB IL-6's predictive value for higher-grade BPD (Grade 2-3 BPD) with an AUC of 0.737 (95% CI: 0.642-0.831).The optimal cutoff value was 76.71 pg/ml, providing a sensitivity of 71.9% and specificity of 69.9%.In ROC curve d, the ROC curve demonstrates the predictive ability of UCB IL-6 for BPD occurrence.The AUC was calculated as 0.644 (95% CI: 0.582-0.706),with an optimal cutoff value of 90.92 pg/ml.Sensitivity and specificity were 41.9% and 81.2%, respectively.

Multivariable analysis
The results of the ordinal logistic regression analysis are presented in Table 4. Patients were categorized based on severity grade as follows: No BPD as 0, Grade 1 BPD as 1, and Grade 2-3 BPD as 2. According to the logistic regression analysis, UCB IL-6, NT-proBNP, pH value, Apgar scores at 1 and 5 min, perinatal asphyxia, initial FiO 2 , invasive ventilation, surfactant therapy, HsPDA, RF, and IVH were identified as independent risk factors for grading BPD.

Model performance and feature importance distribution
Twelve clinical factors with significant differences in multivariate analysis, as well as key factors (gender, GA and BW), were incorporated into ML models.ROC curves were derived through ten-fold cross-validation.The Micro-average AUROC values for the XGBoost, CatBoost, LightGBM, and Random Forest models were 0.841, 0.870, 0.851, and 0.878, respectively (Figure 4).Detailed metrics for these four models can be found in Supplementary Table 1.
Based on SHAP values, Figure 5 illustrates the feature importance distribution for each model.It is evident that UCB IL-6 is the most crucial feature across all models.

Discussion
We verified the predictive significance of UCB IL-6 for severitygraded BPD among a Chinese cohort, in accordance with the 2019 NRN guidelines.Based on the results of univariate analysis, UCB IL-6 levels exhibited an increase in tandem with the grades of BPD.Our study demonstrated that UCB IL-6 predicts the occurrence of BPD.Additionally, it predicts the presence of Grades 2-3 BPD when compared to either No BPD or Grade 1 BPD.What's more, ordinal logistic regression analyses indicated that UCB IL-6 is a predictive factor for grading BPD.UCB IL-6 predicts Grade 2-3 BPD better than Grade 1 BPD, as do the four ML models we developed.It was observed that the UCB IL-6 factor consistently The 2001 NICHD criteria for BPD are outdated due to advancements in respiratory support methods.The 2018 NICHD criteria appear to be somewhat complex.The 2019 NRN guidelines are easy and effective, and using them to assess the severity-graded BPD can provide the most accurate predictions regarding infant outcomes at 18-26 months of corrected age (12).Based on the 2019 NRN standards, it was found that infants classified with a higher grade BPD have a significantly higher risk of requiring tracheostomy or facing mortality compared to those with a lower grade of BPD (30).Early prediction of severity-graded BPD under the 2019 NRN standards is necessary, as it can predict clinical burden, guide treatment, and inform follow-up strategies.
Yan et al. (6) found that the higher the grade of BPD, the higher the concentration of peripheral blood IL-6.Cansu Yılmaz et al. (31) found that elevated levels of IL-6 in tracheal aspirates of newborns were associated with the severity grade of BPD.In contrast, our study focuses on UCB, obtained at an earlier time point, and The predictive value of umbilical cord blood IL-6 levels for the grade of BPD.AUC stands for area under the receiver operating characteristic curve; and BPD stands for bronchopulmonary dysplasia.ROC curve a: LnIL-6 levels for distinguishes infants with Grade 2-3 BPD from those with No BPD; ROC curve b: LnIL-6 levels for distinguishing between less severe (Grade 0-1) and more severe (Grade 2-3) BPD; ROC curve c: LnIL-6 levels' predictive value for higher-grade BPD (Grade 2-3 BPD) in patients with BPD; ROC curve d: LnIL-6 levels for prediction of BPD occurrence.

Characteristics
Crude odds ratio a 95% CI

P a value
Adjusted odds ratio b 95% CI P b value poses lower invasiveness to the neonates.IL-6 may play a pivotal role in the pathogenesis of BPD.One possible explanation is that activation of IL-6 signaling in macrophages by high oxygen impairs the homeostasis of alveolar type II epithelial cells and disrupts the formation of elastic fibers, thereby inhibiting lung growth (32).Additionally, studies suggest that the crosstalk between inflammation and cell death might be associated with oxygen-induced lung injury in BPD.Future therapeutic approaches for BPD should be centered on suppressing the expression of cytokines (33).Zhang et al. (34) revealed that BPD might be influenced by the inflammatory response to the gut microbiota.Further in-depth research and exploration are necessary to elucidate the specific underlying mechanisms.
Most predictive models traditionally relied on statistical methods.However, there is a recent surge in utilizing AI techniques to develop BPD prediction models.For example, a study integrated clinical data and genomics to construct a ML model for predicting BPD and severe BPD, achieving an AUC of 0.872 (35).In 2023, Wen He and colleagues developed multiple ML models for predicting BPD severity using clinical data, with the highest AUC reaching 0.86 (10).However, it's important to note that their criteria for diagnosing and grading BPD were based on the 2001 NICHD guidelines, which might not fully account for advances in respiratory support technology today, limiting the practical use of their models.In contrast, our models adhere to the more up-todate 2019 NRN standards, making them more suitable for clinical application.Furthermore, our models include new predictive factors -NT-proBNP and UCB IL-6.Notably, UCB IL-6, being noninvasive, consistently held the top rank in the feature importance distribution charts within all four models, signaling its significance.
Other than UCB IL-6, we observed that in our four models, the top five factors in terms of importance also include BW, GA, NT-proBNP, pH value, and FiO 2 .Factors that impede alveolar maturation, such as BW and GA, are considered crucial elements in the pathology of BPD.Previous studies have demonstrated that NT-proBNP levels can predict moderate-to-severe BPD or mortality (36)(37)(38).Given that, in this study, based on the 2019 NRN criteria, NT-proBNP remains a significant risk factor for grading BPD, further attention should be directed towards understanding how NT-proBNP influences infant lung development.According to our study, timely correction of premature infants' pH levels and prevention of acidosis are crucial.Additionally, we noticed high initial FiO 2 levels as an independent risk.One possible reason is that elevated oxygen triggers reactive oxygen species, harming the lungs (39).
However, our study has limitations.The small number of patients with Grade 3 BPD in our level III NICU forces us to combine it with Grade 2 BPD, representing a more severe condition.Therefore, we are unable to predict the likelihood of an infant developing grade 3 BPD specifically; instead, we can only predict whether the newborn is at a more severe level of BPD.Being retrospective, confounding factors weren't controlled.Additionally, our data were collected from a single center, potentially not representing all patients.Future research requires a prospective study with a larger sample size, multiple centers, and external validation to incorporate UCB IL-6 in predicting preterm infant outcomes.

Conclusion
Our research focused on preterm infants with GA less than 32 weeks.We developed four ML models and found that UCB IL-6 plays a crucial role in predicting the severity grade of BPD.

FIGURE 4
FIGURE 4The performance of various machine learning models through ten-fold cross-validation.(A-D) Display the ROC curves of the XGBoost, CatBoost, LightGBM, and RF models.Class 0 represents No BPD; class 1 represents Grade 1 BPD, and class 2 represents Grade 2-3 BPD.RF stands for random forest, ROC stands for receiver operating characteristic, AUC stands for area under the receiver operating characteristic curve, and BPD stands for bronchopulmonary dysplasia.
Gao et al. 10.3389/fped.2023.1301376belonging to boosting algorithms, while Random Forest represents the bagging method.AdaBoost, functioning as a standalone model, is limited to binary classification and cannot be employed for multiclass tasks.Notably, XGBoost, LightGBM, and Catboost are all improved versions of GBDT.Therefore, in this study, we employed the four most commonly used methods in similar research endeavors (27-29): XGBoost, LightGBM, Catboost, and RF.

TABLE 1
Demographic and clinical burden in different groups.

TABLE 2
Antenatal and birth characteristics in the patients among three groups.

TABLE 3
Early postnatal characteristics among three groups.

TABLE 4
Results of the ordinal logistic regression model for severitygraded BPD.